The Development of Multimodal Lexical Resources
Abstract
Human communication is a multimodal activity, involving not only speech and written expressions, but also intonation, images, gestures, visual cues, and the interpretation of actions through perception. In this paper, we describe the design of a multimodal lexicon that is able to accommodate the diverse modalities that present themselves in NLP applications. We have been developing a multimodal semantic representation, VoxML, that integrates the encoding of semantic, visual, gestural, and action-based features associated with linguistic expressions.

1 Motivation and Introduction

The primary focus of lexical resource development in computational linguistics has traditionally been on the syntactic and semantic encoding of word forms for monolingual and multilingual language applications. Recently, however, several factors have motivated researchers to look more closely at the relationship between both spoken and written language and the expression of meaning through other modalities. Specifically, at least three areas of CL research have emerged as requiring significant cross-modal or multimodal lexical resource support. These are:

• Language visualization and simulation generation: creating images from linguistic input; generating dynamic narratives in simulation environments from action-oriented expressions (Chang et al., 2015; Coyne and Sproat, 2001; Siskind, 2001; Pustejovsky and Krishnaswamy, 2016; Krishnaswamy and Pustejovsky, 2016);

• Visual question answering and image content interpretation: QA and querying over image datasets, based on the vectors associated with the image, but trained on caption-image pairings in the data (Antol et al., 2015; Chao et al., 2015a; Chao et al., 2015b);

• Gesture interpretation: understanding spoken language integrated with human- or avatar-generated gestures; generating gestures in dialogue to supplement linguistic expressions (Rautaray and Agrawal, 2015; Jacko, 2012; Turk, 2014; Bunt et al., 1998).

To meet the demands for a lexical resource that can help drive such diverse applications, we have been pursuing a new approach to modeling the semantics of natural language, Multimodal Semantic Simulations (MSS). This framework assumes both a richer formal model of events and their participants and a modeling language for constructing 3D visualizations of objects and events denoted by natural language expressions. The Dynamic Event Model (DEM) encodes events as programs in a dynamic logic with an operational semantics, while the language VoxML (Visual Object Concept Modeling Language) is being used as the platform for multimodal semantic simulations in the context of human-computer communication, as well as for image- and video-related content-based querying.

Prior work in visualization from natural language has largely focused on object placement and orientation in static scenes (Coyne and Sproat, 2001; Siskind, 2001; Chang et al., 2015). In previous work (Pustejovsky and Krishnaswamy, 2014; Pustejovsky, 2013a), we introduced a method for modeling natural language expressions within a 3D simulation environment, Unity. The goal of that work was to
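To fix ideas, the sketch below shows one way a multimodal lexical entry for an object such as "cup" could be laid out as a plain data structure, pairing the lexical predicate with visual geometry, canonical orientation, and afforded actions. This is only an illustrative approximation under our own assumptions: the field names and values here are hypothetical and do not reproduce the actual VoxML markup or schema.

```python
# Illustrative sketch only: a multimodal lexical entry ("voxeme") as a plain
# data structure.  The groupings (lexical predicate, geometry, habitat,
# affordances, embodiment) loosely echo the kinds of features discussed for
# VoxML, but the concrete field names and values are assumptions made for
# this example, not the official VoxML specification.

from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Voxeme:
    lex: Dict[str, str]          # predicate name and semantic type
    geometry: Dict[str, object]  # visual / geometric properties
    habitat: Dict[str, str]      # canonical orientation and placement
    affordances: List[str]       # actions the object typically affords
    embodiment: Dict[str, object]  # scale and movability relative to an agent


# Hypothetical entry for "cup"
cup = Voxeme(
    lex={"pred": "cup", "type": "physobj*artifact"},
    geometry={
        "head": "cylindroid",
        "concavity": "concave",
        "components": ["surface", "interior"],
    },
    habitat={"intrinsic_up": "align(Y, E_Y)", "top": "top(+Y)"},
    affordances=["grasp(agent, cup)", "contain(cup, liquid)", "lift(agent, cup)"],
    embodiment={"scale": "smaller_than_agent", "movable": True},
)

print(cup.lex["pred"], "->", cup.affordances)
```

A representation along these lines makes explicit how linguistic, visual, and action-based information can be bundled in a single lexical entry, which is the kind of integration the VoxML framework is designed to support.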
Publication date: 2016